DOCUMENT RESUME

ED 200 608                                        TM 810 030

AUTHOR        Brinzer, Raymond J.
TITLE         New Directions in Matching Familiar Figures Test Research Resulting From Scoring and Item Analyses.
INSTITUTION   West Virginia State Dept. of Education, Charleston.
PUB DATE      Feb 79
NOTE          Paper presented at the Eastern Educational Research Association Conference (Kiawah Island, SC, February, 1979).
EDRS PRICE    MF01/PC02 Plus Postage.
DESCRIPTORS   *Conceptual Tempo; Difficulty Level; *Item Analysis; *Research Methodology; *Scoring Formulas; Test Items; *Test Reliability; *Test Validity
IDENTIFIERS   *Matching Familiar Figures Test (Kagan)

ABSTRACT

The problem engendered by the Matching Familiar Figures (MFF) Test is one of instrument integrity. This is delimited by validity, reliability, and utility of MFF as a measure of the reflective-impulsive construct. Validity, reliability and utility of construct assessment may be improved by utilizing: (1) a prototypic scoring model that will enable development of MFF norms; and (2) item analyses (performed on MFF test items) results which will reveal good test items, reveal defective test items, provide a graphic display of item performance, explain the origin of the current imbroglio about MFF test reliability and validity, and indicate steps necessary to enhance MFF instrument and research integrity. The impulsive-deliberative score (ID Score) is discussed as a potentially better scoring procedure than the double median split procedure for users of reflective-impulsive category information. Despite the limitations and inchoate nature of the research presented in this paper, it would appear that research directed along similar lines would be in the best interests of the scientific method. (EL)



Reproductions supplied by EDRS are the best that can be made from the original document.



New Directions in Matching Familiar Figures
Test Research Resulting From Scoring
and Item Analyses

Raymond J. Brinzer, Ph.D.
West Virginia Department of Education
Charleston, West Virginia

Paper presented at the Eastern Educational Research Association Conference, Kiawah Island, South Carolina, February 1979.



Prior to introduction of the Matching Familiar Figures Test (MFF) by Kagan, Rosman, Day, Albert, and Phillips (1964), classification of subjects as reflective or impulsive styled was accomplished by a variety of subjective measures (Hartshorne, May & Maller, 1929; Sutton-Smith & Rosenberg, 1959). Introduction of the MFF provided a method for quickly and objectively determining a subject's cognitive style.

The MFF is beset by reliability, validity, and utility problems. A survey of the literature reveals the nature of the problems, and a more intensive examination reveals their source.

Although the MFF has been in use for approximately 15 years, little measurement analysis has been noted in the literature. This situation endures despite the appearance of a considerable MFF research effort. For example, an ERIC search conducted in September 1977 revealed a total of 102 studies listed under the conceptual tempo description. Of the 102 studies, approximately two major studies were listed as dealing with instrument reliability as a major emphasis, and eight major studies with validity in the same light.

An intensive review of MFF research literature covering the Educational Research Index, Psychological Abstracts, and Dissertation Abstracts revealed a plethora of MFF research. However, not one study to date has been noted that deals with the basic integrity¹ of the MFF as determined by a classical measurement approach to the instrument's behavior.

The reliability and validity problems of the MFF are related, in part, to the scoring and classification system (error rate-response latency/double median split) used in operationalizing the reflective-impulsive classification construct. In turn, the scoring and classification system has impeded standardization of the instrument.

Salkind (1977) has recently moved to develop norms and a scoring model (Salkind & Wright, 1977) for the MFF instrument. His completion of a study designed to produce normative information for the MFF constitutes a significant step in MFF research. The step, however, constitutes little more than an ... the classical measurement sense, have been omitted.

A pervasive theme begins to emerge when one considers the MFF in the perspective of its research and development. That theme is characterized by a field or functional fixedness that is typified by movement in a consistent direction set by MFF instrument parameters and concepts established by Kagan, et al., in MFF instrument development and scoring. This theme leads to the current problem in MFF research.



¹ The Random House Dictionary of the English Language defines integrity as the state of being whole, entire, undiminished, sound, unimpaired, or perfect in condition.



Statement of the Problem

The problem engendered by the MFF situation is a highly complex one. It is, specifically, one of instrument integrity. Instrument integrity is delimited by validity, reliability, and utility of the MFF instrument as a measure of the reflective-impulsive construct. No action has been taken to date in order to rectify the MFF problem situation. Analysis indicates that the solution to the problem lies in a maximally effective MFF instrument, scoring, and classification system. Little issue would appear related to the construct itself. For supporting evidence review Hartshorne, May, and Maller (1929); Cattell (1937); Murray (1938); Polansky, Lippitt, and Redl (1950); Sutton-Smith and Rosenberg (1959); and Kagan (1964, 1965).

Objectives

Objectives for this research endeavor were set by the nature of the problem. Specifically, they were intended to provide a direction to problem solution resulting in the highest validity, reliability, and utility of construct assessment possible. The foregoing may best be attained through the following:

1.0 Introduction of a prototypic scoring model that will enable development of Matching Familiar Figures Test (MFF) norms. Additionally, the model should:

    1.1 Increase MFF efficiency.
    1.2 Provide improved MFF test reliability.
    1.3 Provide improved MFF test validity.
    1.4 Provide individual test administration and interpretation capability.
    1.5 Solve some of the problems and allay many of the criticisms stemming from use of the current scoring model.

2.0 Presentation of item analyses results performed on the MFF test items. Item analyses results should:

    2.1 Reveal the good test items.
    2.2 Reveal the defective test items.
    2.3 Provide a graphic display of item performance.
    2.4 Explain the origin of the current imbroglio about MFF test reliability and validity.
    2.5 Indicate the steps that must be taken in order to enhance MFF instrument and research integrity.



Review of the Literature

As mentioned previously in this paper, a limited amount of research deals with the reliability and validity of the MFF Test. Of that research, Hall and Russell (1974) researched the divergent and convergent validity of conceptual tempo. The researchers found that no divergent validity existed for conceptual tempo on the MFF, Word Recognition Test, Raven Coloured Progressive Matrices, and Peabody Picture Vocabulary Test. This would tend to indicate that the trait is generalizable across tasks, as a consistent response time tendency emerged on the researched tasks. Reliability for errors (number correct) on the MFF was reported lowest of the four instruments used in the study, with mean improvement being less than one item (the MFF was lowest here also). The authors reported that the low reliability questioned the double median split classification procedure.

Block, Block, and Harrington (1974) reported on the MFF Test as a measure of reflection-impulsivity. The authors reported that Kagan defines the concept in narrow terms, but applies it in a broad general sense. Additionally, they indicated that "the evidence for the construct validity of the MFF was sparse, often inconsistent, and sometimes irrelevant" (p. 612). The authors indicated that their intent was to:

    describe the discrepancy between Kagan's conceptualization ... and his operationalization of reflection-impulsivity; ... assess the construct validity of the MFF ... and ... present a representative portion of data ... that bears on ... the MFF situation (p. 612).



Ault, Mitchell, and Hartmann (1976) reported that Kagan's original reliability assessment was listed at .62 for latency, while error score reliabilities were cited in the .23 - .43 range. Although the authors stated that the low reliabilities could be due to a lack of cognitive tempo stability, it would appear that the item performance of the MFF would account for a considerable degree of instability. Readers interested in reliability (test-retest/internal consistency) are referred to the article for an intensive discussion. The researchers close with the statement that the MFF's "validity has been demonstrated over a wide variety of tasks which measure cognitive development" (p. 230). The researchers recommend larger sample sizes, appropriate research designs, and statistical treatments as methods capable of making work with the present form of the test possible.

Egeland and Weinberg (1976) investigated the psychometric credibility of the MFF. They reported reclassification differences favoring reflective subjects with an 80/90 percent reclassification rate and a 50/56 percent rate with impulsives for one second-grade study. Other reclassification information was provided, and readers are referred to the article for a more comprehensive treatment. Additionally, the researchers cited Block, et al.'s characterization of impulsives as fearful and inhibited as consistent with their own interpretation. The authors stated that "the findings raise issue with the typical practice of labeling subjects solely on the basis of Test data" (p. 489).

The authors recommended use of a linear time-error composite rather than the typical nonlinear approach in order to avoid inherent double median split misclassification problems. In closing the researchers wrote:

    While one might question the premature acceptance of the MFF as a psychometric procedure for operationalization of the reflection-impulsivity construct, one might also urge caution and restraint in prematurely rejecting the test as an operational measure of reflection-impulsivity because its psychometric underpinnings have been uninvestigated (p. 490).

Salkind (1977) introduced normative tables at the 1978 AERA Convention. The normative information included descriptive data, means, and standard deviations for errors and response latencies by age; correlations of errors and latency by age and sex; and percentile rank information. The norming population encompassed the 5-12 year-old age range. Salkind's undertaking constitutes a crucial step in the MFF development as a measure of reflection-impulsivity; however, the step preceded an array of more fundamental steps necessary to increase instrument integrity prior to normalization. Salkind's undertaking was a significant step beyond the functional fixedness pattern of much of the existing MFF research, and should ultimately engender a significant contribution to instrument integrity.



Scoring the MFF

The Double Median Split

MFF test results have been consistently scored by the "double median split" procedure established by Kagan. This procedure typically involves administering the MFF test to a group, or groups, of subjects, then ranking all response latencies from lowest to highest, and all error rates from lowest to highest. The median (P50) for the response latencies is then calculated. Likewise, the median error rate is obtained. Then, each subject's test results are examined to determine classification as reflective or impulsive. Typically, 35 percent of a group is reported classified as reflective, 35 percent impulsive, while the remaining 30 percent is unclassified (i.e., fast accurate and slow inaccurate; Hall and Russell, 1974, p. 933). Fast accurates and slow inaccurates are those subjects who fall below the median on both response latency and error rate, or above the median on both measures, respectively.
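The procedure just described can be sketched in a few lines of Python. This is only an illustrative sketch: the function name, the sample data, and the handling of scores that fall exactly on a median (treated here as "not above" it) are assumptions, not part of Kagan's published procedure.

```python
from statistics import median


def double_median_split(subjects):
    """Classify subjects by the double median split.

    `subjects` maps a subject id to (response latency, error count).
    Scores exactly at a median are treated as "not above" it, which is
    an assumption made for this sketch.
    """
    latency_median = median(lat for lat, _ in subjects.values())
    error_median = median(err for _, err in subjects.values())

    labels = {}
    for sid, (lat, err) in subjects.items():
        slow = lat > latency_median
        inaccurate = err > error_median
        if slow and not inaccurate:
            labels[sid] = "reflective"
        elif not slow and inaccurate:
            labels[sid] = "impulsive"
        elif slow and inaccurate:
            labels[sid] = "slow inaccurate"
        else:
            labels[sid] = "fast accurate"
    return labels


# Hypothetical data: (latency in seconds, errors).
sample = {"s1": (25.0, 3), "s2": (6.0, 14), "s3": (30.0, 15), "s4": (5.0, 2)}
labels = double_median_split(sample)
```

Because both medians are computed from the group being scored, the same subject can receive a different label when scored with a different group.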

A variety of characteristics may be attributed to a double median split scoring procedure. Some of the characteristics appear positive, while others appear in a more negative light. Only the more salient negative characteristics will be discussed here. Typically, they involve the following:

1. Measures are group dependent, i.e., relative to specific groups. Technically, a group of reflective subjects could, by virtue of individual processing differences, be artificially classified as reflective or impulsive. A case in point would be one in which several classes are independently classified as reflective or impulsive. All reflective subjects from the several classes could then be combined and the double median split procedure applied. The reflectives could then be classified as reflective or impulsive. Additionally, unclassifieds could ostensibly achieve new classificatory status via the double median split procedure. This classification variance would appear to have serious implications for the double median split scoring procedure. On the other hand, a standard scoring procedure utilizing X̄'s, ranges, SD's, and an index score combining response latency and error rate via a ratio would appear to preclude many problems attributable to the present scoring system.

2. The double median split procedure assumes that a specific within-group distribution exists. This implicitly negates the possibility of the construct being normally distributed within the population, and atypically distributed within specific groups.

3. Sex, age, SES, and other performance differences have been reported in the research literature. The double median split scoring procedure would not appear to demonstrate the potential capacity to systematically treat these differences, as they would tend to be offset by the groups themselves. Development of specific normative data for these groups, on the other hand, would appear to place them in a more appropriate classificatory perspective. For instance, performance differences by sex, race, or SES might appear more salient, and valid, from a classificatory perspective when these factors are controlled. Additionally, differences between or among groups, e.g., by sex, might be more validly attributed to specific group characteristics or performance.

4. Measures may vary considerably. The double median split scoring procedure may result in a considerable variation in classification as a result of seemingly inconsequential score differences. For example, the classification percentages for the research data used as the basis for this paper demonstrate the following differences for the 6-8 grade levels (see Appendices A, B, and C for specific data):



Table 1

Double Median Split Classification Variance for a Limited n at Three Grade Levels (a)

                       Classification
Grade    n     Reflective     Impulsive      Unclassified
6 (b)    15    7   46.6%      7   46.6%      1    6.68%
7        21    9   42.8%      10  47.2%      2    9.5%
8        33    9   27.2%      13  39.9%      11   33.3%

(a) See Appendices A, B, and C for particulars. n differences may constitute a defect in this calculation (e.g., the extremely small n for Grade 6); nevertheless, the objective here is to demonstrate a potential defect. This defect would possibly be amplified due to n variance.
(b) Normal upper limit of the MFF Test.

The reflective variance demonstrated across the 6-8 grade levels ranges from 27.2-46.6 percent, the impulsive variance from 39.9-47.2 percent, while unclassifieds range from 6.68-33.33 percent. The implications of this variance (19.4, 7.3, and 26.6 percent respectively), attributable to the double median split scoring procedure, ought to be fairly obvious. It would appear that this aspect of scoring constitutes a significant portion of the MFF reliability/validity imbroglio.

5. Time expenditures for scoring are considerable in the case of the double median split. Time economy will be discussed later (see p. 9).

6. Due to individual differences, score variability, and specific characteristics of the double median split group-relative system, individual administration of the MFF is not possible. It appears that scores must always be related to the specific group/s.

7. An analysis of the state of the art concerning the MFF would appear to strongly indicate that an attempt to standardize the present instrument is subject to the limitations discussed in this paper and elsewhere. However, this is not an attempt to discredit such an undertaking, as the implications of moving in this direction are in themselves momentous. Additionally, Thorndike and Hagen (1977, p. 94) have stated that "a test with relatively low reliability will permit us to make useful studies of and draw accurate conclusions about groups", which appears to be the case concerning the MFF.
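The group dependence described in item 1 can be illustrated with a short Python sketch. The data are hypothetical, and the compact `split` helper (with ties at the median counted as "not above" it) is an assumption made for illustration only. Four subjects who were each labeled reflective in their own classes are pooled and scored again.

```python
from statistics import median


def split(scores):
    """Double median split; `scores` maps id -> (latency, errors)."""
    lat_med = median(lat for lat, _ in scores.values())
    err_med = median(err for _, err in scores.values())
    out = {}
    for sid, (lat, err) in scores.items():
        if lat > lat_med and err <= err_med:
            out[sid] = "reflective"
        elif lat <= lat_med and err > err_med:
            out[sid] = "impulsive"
        else:
            out[sid] = "unclassified"
    return out


# Hypothetical pooled group made up only of subjects who were
# labeled reflective when scored within their own classes.
reflectives = {"a": (20.0, 2), "b": (22.0, 5), "c": (35.0, 1), "d": (40.0, 4)}
relabeled = split(reflectives)
```

In this made-up pool only one of the four former reflectives keeps the label; one becomes impulsive and two become unclassified, although no individual score has changed.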



The ID Score

Response latency and error rate are essential components of the reflective-impulsive construct. Historically, response latency and error rate have been treated via the double median split scoring procedure mentioned previously. The group dependence, potential variability, and inadequacy for standardization of the double median split procedure indicate that another scoring procedure would better serve researchers, psychologists, school counselors, and other potential users of reflective-impulsive category information. An ideal scoring procedure would have the potential for individual administration, standardization, increased reliability and validity, time economy, and concomitant trait identification and analysis. This scoring procedure would appear to combine response latency and error rate into an index that could then be related to a classical measurement framework, including such aspects as X̄'s, P50's, SD's, SEM's and item analyses (power, discrimination, reliabilities, and validities). In this manner a more effective MFF instrument could be developed, resulting in far greater reliability, validity, and instrument utility (see Appendix G for recommended selected norming controls). Such a procedure and results are made possible through the ID Score (impulsive-deliberative; named after H. A. Murray, an early researcher in the area).

The ID Score is the ratio of X̄ response latency to X̄ error rate. It is obtained by the algebraic formula:

Table 2

ID Score Formula

\[
\text{ID Score}
  = \frac{\bar{X}_{RL}}{\bar{X}_{ER}}
  = \frac{\left(\sum \text{Response Latencies}\right) / n}{\left(\sum \text{Error Rates}\right) / n}
  = \frac{TRL}{TER}\;^{a}
\]

where n is the number of subjects, TRL is the total response latency, and TER is the total error rate.

(a) Preferred computation due to convenience/rapidity.

The formula produces a score that would appear to be a somewhat better measure of individual impulsivity-reflectivity. This is due to the group interactive nature of the double median split procedure. Directions for calculation of an ID Score, and a facsimile ID Score sheet/instructions, are located in Appendices D, E, and F respectively. Examination of these appendices should give the reader a somewhat better idea of the potential for scoring ease and standardization that is characteristic of the ID Score. A calculation using actual data is entered on the score sheet for review (see Appendix F).

The ID Score,² evidencing the aspects of a zero (0) base, linear trend, and open upper end, would appear to have considerable potential for normalization and resolution of many of the problems presently attributable to the double median split scoring system. A considerable amount of research will have to be conducted with this procedure in order to build the research groundwork necessary to soundly establish the appropriate standardization process and illuminate the potential pitfalls.

² Early research results indicate that impulsive ID Scores generally range below ten.
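The ID Score computation can be sketched as follows. This is a minimal illustration, not the scoring sheet from the appendices: the function name and the sample latencies and errors are hypothetical, and returning `None` when no errors were made is an assumption, since the paper does not say how a zero error total is to be handled.

```python
def id_score(latencies, errors):
    """Return total response latency divided by total errors (TRL / TER).

    Returns None when the error total is zero; the ratio is undefined
    in that case, and this handling is an assumption of the sketch.
    """
    total_errors = sum(errors)
    if total_errors == 0:
        return None
    return sum(latencies) / total_errors


# Hypothetical protocol: latency in seconds and errors for 12 items.
latencies = [4.0, 6.5, 3.0, 8.0, 5.5, 7.0, 2.5, 9.0, 6.0, 4.5, 5.0, 3.0]
errors = [1, 0, 2, 1, 0, 1, 2, 0, 1, 1, 0, 1]

score = id_score(latencies, errors)  # 64.0 seconds / 10 errors

# The ratio of the two means gives the same value, because the
# divisor n cancels out of numerator and denominator.
n = len(latencies)
mean_ratio = (sum(latencies) / n) / (sum(errors) / n)
```

Because the divisor n cancels, the total-over-total form needs no averaging step, which is consistent with the paper calling it the preferred computation.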

Increased MFF Efficiency

Test efficiency may be increased in a variety of ways. Additionally, efficiency ought to be approached from different perspectives, such as examiner, examinee, instructions, procedures, scoring, and results interpretation and use. Some, or all, of the preceding aspects would seem to have an effect upon test efficiency, and in combination that effect might tend to be dramatic.

Efficiency is viewed here from a scoring perspective. Consequently, data recording, computation, and time economy are primary considerations on the one hand. On the other hand, item function would appear to be involved in test efficiency; however, it appears more properly relegated to the reliability and validity realms. Subsequently, efficiency is synonymous here with utility.

The smallest group (grade 6, n = 15) was selected in order to get an idea of MFF scoring efficiency. Following are the efficiency particulars from a scoring perspective for one versed male scorer:

Table 3

Scoring Efficiency Information for the Double Median Split and ID Score Systems Respectively

System: Double Median Split
    n:                    15
    Tabulation time:      Pretabulated (includes posting RL/ER's and determining individual totals)
    P50 time:             ... 44"
    Classification time:  ... 17"
    Total time:           6' 01" (361")
    Efficiency index:     2.01 times as long as the ID Score method (i.e., twice the time)

System: ID Score
    n:                    15
    Tabulation time:      Pretabulated (see above)
    P50 time:             1' ...
    Classification time:  ... 59"
    Total time:           2' 59" (179")
    Efficiency index:     .495 or 50% of double median split scoring time (i.e., half the time)

Substituted a table similar in format due to the lack of developed tables.

Virtually no conformity was noted in the scoring area other than use of the double median split.



On the basis of this limited trial conducted by the researcher, it appears that the ID Score system is considerably more economical in terms of time expended in the scoring process. Additionally, it is hypothesized that, as the n increases, the time advantage in favor of the ID Score will become even more pronounced due to the physical limitations of the double median split procedure. It may take fully twice as much, or more, time to classify subjects using the double median split routine. An extensive research investigation of time expenditures in both systems would appear to contain more definitive answers to any questions raised here.
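The efficiency index reported for this trial is simply the ratio of the two total scoring times. A minimal sketch of that arithmetic, using the totals of 361 and 179 seconds given in Table 3 (the variable names are illustrative):

```python
# Total scoring times in seconds for the grade 6 group (n = 15).
dms_seconds = 361  # double median split: 6' 01"
id_seconds = 179   # ID Score: 2' 59"

# How many times longer the double median split took.
ratio = dms_seconds / id_seconds

# ID Score time as a fraction of double median split time.
inverse = id_seconds / dms_seconds
```

The ratio works out to roughly 2.017 and its inverse to roughly 0.4958, so the reported figures of 2.01 and .495 appear to be truncated rather than rounded.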

Test Reliability and Validity

A test can have extremely high reliability and little or no validity. However, a test cannot be qualitatively valid with low reliability. Reliability is considered a necessary quality to validity.

Problems concerning the reliability and validity of the MFF test have been discussed earlier in this paper (Hall & Russell, 1974; Block, et al., 1974; Kagan, 1965; Egeland & Weinberg, 1976), as well as elsewhere (e.g., Kagan & Messer, 1975).

Although reliability is necessary to validity, relatively low reliability, such as in the case of the MFF, does not disqualify a psychological construct. Relatively low reliabilities, however, dictate the nature of related research and justifiable interpretation of results (this concept was briefly covered in the section "The Double Median Split," p. 6). For instance, Thorndike and Hagen wrote that (partially quoted earlier)

    a test with relatively low reliability will permit us to make useful studies of and draw accurate conclusions about groups, but relatively high reliability is required if we are to have precise information about individuals (p. 94).

The current MFF scoring system would appear to have little value for individual difference research. However, increased reliability and, consequently, validity should improve the quality of research findings and generalizability regarding groups, as well as enable justifiable research and educational decision-making in the area of individual differences.

Consideration of Kagan's (1965) original reliability assessment data in the perspective of individual and group reliabilities indicated that the .62 response latency finding reported and the .23 - .43 error score range would have the following implications for the percentage of reversals with repeated testing (i.e., retesting):



Table 4

Approximate (d) Reversal (a) Percentage Chance Figures (b) for MFF Retesting Using Selected Reliability Scores (c)

Category            r      Single Individuals    Groups (X̄ of 25)     Groups (X̄ of 100)
Response Latency    .62    1/3 or 32.5%          1/83.3 or 1.2%       less than 1/2500 or less than .04%
Error Score         .23    1/2.19 or 45.7%       1/3.28 or 30.45%     1/4.05 or 24.65%
                    .43    2/5 or 40.3%          1/9.17 or 10.9%      1/142.85 or .7%

(a) Reversal, i.e., chance for being classified reflective, impulsive, or unclassified one time and changing classification upon retesting.
(b) Thorndike and Hagen source (1977, p. 93).
(c) Kagan reliabilities source (see p. 4).
(d) Approximations based on interpolation of the Thorndike-Hagen table. This procedure may be invalid. The intent is to communicate general implications, not exact data.

It is believed that, due to the synergistic nature of reliability and validity, and the synergistic nature of response latency and error rate in the measurement of reflectivity-impulsivity in the double median split, the cooperative action of the low reliabilities for response latency and error rate is such that the total deficiency in terms of reliability and validity is maximally less than the low reliability of the response latency taken separately, and minimally more than the error rate reliabilities taken separately. That is, it would fall somewhere in the middle area (circa .475; the midpoint may be too high an estimate due to the peculiar relationship of these reliabilities). Consequently, an idea of the nature of the reliability-validity controversy in the MFF area can be gained, and the necessity of approaching the problem in a classical measurement fashion, and the latter's implications for improving test reliability and validity, appreciated.

Item Analyses Results

As stated earlier, no item analysis results dealing explicitly with the MFF instrument have been noted in the research literature. The MFF test was individually administered by the researcher in this particular instance to a select remedial reading population drawn (n = 227, or 10%) according to the following criteria out of a total N of approximately 2200. Remediation classification was based upon a qualifying score on the Metropolitan Reading Readiness Test in Grade 1 of low C, D, or E and teacher/supervisor judgement; and a cumulative deficit of three months per grade level (e.g., 1.7/grade 2, 2.4/3, 3.1/4, 3.8/5, 4.5/6, 5.2/7, and 5.9/8) in the comprehension section of the Gates-MacGinitie Reading Test over the 2-8 grade levels and teacher/supervisor judgement. Basic MFF test results follow in Table 5, shown below.

Table 5

Response Latency and Error Rate Ranges for an Experiment with an n of 227

Grade    n     Response Latency R     Error Rate R
1        37    1.9  - 70.5            4 - 29
2        34    2.48 - 60.64           4 - 28
3        37    3.17 - (41.75*)        (4) - 24
4        28    2.22 - 49.38           3 - 23
5        23    3.16 - 62.33           3 - 19
6 (a)    15    5.0  - 40.77           1 - 14
7        21    3.85 - 73.5            2 - 17
8        33    3.88 - 55.38           3 - 13

* Extreme score cut (369.5 with 1 error).
(a) Normal upper limit of the MFF Test (elementary edition) used in this study.

Note. Scheduling and other factors influenced the middle school sample used for this study (Grades 6-8). Actually, the information is somewhat different than it appears due to the fact that approximately one-half (two of four participating elementary schools) of approximately 340 remedial reading elementary subjects representing all of the elementary remedial reading subjects were tested, while all of the 6-8 grade students were tested. The 6-8 grade student sample, however, did not constitute the entire middle school remedial reading population.

Definite trends become apparent upon examination of the data in Table 5. Those trends consist mainly of a general increase in minimum response latency over the grade levels 1-8 (1.9-3.88), an initial decrease in maximum response latency over grade levels 1-3 (70.5-41.75), then an apparent increase at the intermediate level (49.38-62.33). Upper elementary or middle school ranges are somewhat erratic. Once again, it is believed that the low n (15/21 at Grades 6/7) may contribute to this display.

An examination of the error rate range indicates that, generally speaking, the minimum number of errors decreases over the 1-8 grade level range (4-3), while the maximum number of errors decreases also (29-13). The trends demonstrated in this data would appear to be in line with expectancy. An increased n in all cases would appear to be a necessary factor in future experiments along similar lines. Additionally, the clear trend for decreasing response latencies with age would appear to be somewhat contradicted by this particular set of tabular data. It would appear that cognitive maturation, in part, might contribute to a decreasing response latency with age, especially in the case of reflective subjects. The increasing response latency-age relationship is readily apparent only in the case of low response latency subjects for this particular data.

Good and defective items. An item analysis performed on individual
items across the 1-8 grade levels reveals a variety of item performances. In
viewing item difficulty levels, e.g., a classical measurement approach would
indicate that items functioning systematically could be expected to demonstrate
performance in the .30 - .70 range. Furthermore, the slope of this performance
ought to be positively linear if the items are functioning effectively and their
performance is related to cognitive maturity.

Startling results emerged from the initial item analysis performed on
the test data. These results indicate that approximately eight of the items
(#1, 4, 5, 6, 7, 8, 9, and 12) are defective in this particular instance,
while four of the items (2, 3, 10, and 11) may be termed good items.
Defective items are defined as those items that:


1. Are too easy, such as item #5, which demonstrates a range of
61 - 91 percent over grade levels 1-8.

2. Are too hard, such as item #12, which demonstrates a range of
31 - 27 percent over grade levels 1-8.

3. Demonstrate an unsystematic or sporadic slope, such as item #4,
which traverses a range from 33 percent (Grade 1) to 62 percent
(Grade 3) to 33 percent (Grade 8).






On the other hand, good items are defined as those items that demonstrate
systematic slope across the grade levels, such as the slopes demonstrated by
items 2, 3, 10, and 11.
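The classification rules above can be sketched in a few lines of Python. This is only an illustration, not the procedure used in the study: the function names, the use of a Pearson correlation with grade level as the test of "systematic slope", and the 0.8 cutoff are all assumptions introduced here. Only the 30 - 70 percent band and the example percentages come from the text.

```python
def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat line has no slope to speak of
    return sxy / (sxx * syy) ** 0.5


def classify_item(grades, percents, low=30, high=70, min_r=0.8):
    """Label an item from its percent-correct values by grade level.

    low/high follow the .30 - .70 band mentioned in the text;
    min_r is a hypothetical cutoff for "systematic slope".
    """
    if max(percents) > high:
        return "too easy"
    if min(percents) < low:
        return "too hard"
    if pearson_r(grades, percents) < min_r:
        return "erratic"
    return "good"


# Values quoted in the text for items 5, 12, and 4.
print(classify_item([1, 8], [61, 91]))         # too easy
print(classify_item([1, 8], [31, 27]))         # too hard
print(classify_item([1, 3, 8], [33, 62, 33]))  # erratic
```

With these assumed cutoffs, the three items quoted in the text fall into the same categories the paper assigns them. A different slope criterion could reasonably be substituted.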


A tabular display of item analysis results (difficulty level) that is
supportive of the preceding item classifications is included in Table 6.
Comments about the general nature of the items are included in the comments
column.


Table 6

MFF Item Difficulty Data

Item    General Comment
 1      Too hard
 2      Good
 3      Good
 4      Poor
 5      Too easy
 6      Too easy
 7      Too hard
 8      Too hard
 9      Erratic
10      Good
11      Good
12      Too hard

normal upper limit of the MFF
Test used in this study



Item performance graphic display. Perhaps the most dramatic description
of specific item performance for this data can be obtained by graphically
displaying each item. For this purpose an ideal, or artificial, item curve
has been included. Although developmental trends do not always follow the ideal
or linear, it is anticipated that a rather systematic slope ought to be the case
in terms of effective item functioning and developmental differences. This is
indeed the case as concerns the good items. However, the performance of the
defective items appears rather self-explanatory. Ideal, good, defective,
composite, and comparative (good - defective - ideal) item curve tables are
included according to the following schedule:



1. Table 7 - ideal item curve for eight grade levels (p. 15).

2. Table 8 - ideal item curve for six grade levels (p. 16).

3. Table 9 - defective item curves (p. 17).

4. Table 10 - composite defective item curve (p. 18).

5. Table 11 - good item curves (p. 19).

6. Table 12 - composite good item curve (p. 20).

7. Table 13 - good - defective item curves for comparison/contrast
(p. 21).

8. Table 14 - percent differences for good - defective - ideal items
(p. 22).



The tables follow (see pp. 15-22).



Table 7

Eight Grade Level
Ideal Item Curve

[Figure: percent of subjects getting item correct (vertical axis) plotted against grade level (horizontal axis).]

Range = 30 - 70 percent

Rate of change = 5.714% per grade level

Grade level/percent = 1/30, 2/36, 3/41, 4/47, 5/53, 6/59, 7/64, 8/70

computed independent of developmental stages
normal upper limit


Table 8

Six Grade Level
Ideal Item Curve

[Figure: percent of subjects getting item correct (vertical axis) plotted against grade level (horizontal axis).]

Range = 30 - 70 percent

Rate of change = 8 percent

Grade level/percent = 1/30, 2/38, 3/46, 4/54, 5/62, 6/70
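The ideal item curves for eight and six grade levels appear to be straight lines from 30 to 70 percent. A minimal Python sketch of that reading follows; the function name and the rounding to whole percents are assumptions, and only the endpoints and the number of grade levels are taken from the tables.

```python
def ideal_item_curve(n_grades, low=30.0, high=70.0):
    """Return (rate of change per grade, percent correct at each grade)
    for a straight line rising from `low` to `high`."""
    step = (high - low) / (n_grades - 1)
    percents = [round(low + step * g) for g in range(n_grades)]
    return step, percents


step8, curve8 = ideal_item_curve(8)
print(round(step8, 3), curve8)  # 5.714 [30, 36, 41, 47, 53, 59, 64, 70]

step6, curve6 = ideal_item_curve(6)
print(step6, curve6)            # 8.0 [30, 38, 46, 54, 62, 70]
```

Under this reading, the rate of change is simply the 40-point range divided by the number of grade-to-grade steps (seven steps for eight grades, five for six grades).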



Table 9

Defective Item Curves

[Figure: percent of subjects getting item correct (vertical axis) plotted against grade level (horizontal axis).]

Range = 33 - 51 percent
Defective items = 1, 4, 5, 6, 7, 8, 9, 12
Grade level ranges/percent = 1-6/33-58, 1-4/33-37, 5-8/47-51, 7-8/51-51



Table 10

Composite Defective
Item Curve

[Figure: percent of subjects getting item correct (vertical axis) plotted against grade level (horizontal axis).]

Range = 33 - 51 percent

Defective items = 1, 4, 5, 6, 7, 8, 9, 12

Grade level/percent = 1/33, 2/36, 3/43, 4/37, 5/47, 6/58, 7/51, 8/51



Table 11

Good Item Curves

[Figure: percent of subjects getting item correct (vertical axis) plotted against grade level (horizontal axis).]

Range = 31 - 67 percent
Good items = 2, 3, 10, 11
Grade level ranges/percent = 1-6/31-58, 1-4/31-47, 5-8/53-67, 7-8/68-67



Table 12

Composite Good
Item Curve

[Figure: percent of subjects getting item correct (vertical axis) plotted against grade level (horizontal axis).]

Range = 31 - 67 percent
Good items = 2, 3, 10, 11
Grade level/percent = 1/31, 2/49, 3/47, 4/47, 5/53, 6/58, 7/68, 8/67



Table 14

Percent Differences for
Good - Defective and Ideal
Good - Defective Items

Grade Level               1     2     3     4     5
Good items               31    49    47    47    53
Defective items          33    36    43    37    47
Difference               -2    13     4    10     6
Ideal slope              30    36    41    47    53
Good difference           1    13     6     0     0
Defective difference      3     0     2   -10    -6

Total Difference Good - Defective = -2
Total Difference Good - Ideal = +20
Total Difference Defective - Ideal = -44
Total Difference Ideal - Good/Ideal - Defective

Total: -2, +20, 64, -44
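The difference rows can be recomputed from the composite percentages listed under Tables 7, 10, and 12. The sketch below does only that arithmetic; the variable and function names are assumptions, and the lists are the grade 1-8 values as they are given in those tables.

```python
# Percent correct by grade level (grades 1-8), as listed under Tables 7, 10, and 12.
ideal     = [30, 36, 41, 47, 53, 59, 64, 70]
defective = [33, 36, 43, 37, 47, 58, 51, 51]
good      = [31, 49, 47, 47, 53, 58, 68, 67]


def differences(a, b):
    """Grade-by-grade difference a - b."""
    return [x - y for x, y in zip(a, b)]


good_minus_defective = differences(good, defective)
good_minus_ideal = differences(good, ideal)
defective_minus_ideal = differences(defective, ideal)

print(good_minus_defective, sum(good_minus_defective))
print(good_minus_ideal, sum(good_minus_ideal))
print(defective_minus_ideal, sum(defective_minus_ideal))
```

Summed over the eight grade levels, these lists give 64 for good minus defective, +20 for good minus ideal, and -44 for defective minus ideal, which are figures that also appear in the totals of the table.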






Origin of the Current MFF Reliability - Validity Imbroglio

It would appear that the origin of the current MFF imbroglio is attributable
to a variety of causes. One of the most basic causes may reside in the
lack of historical knowledge about the construct itself. Indeed, many
researchers attribute the concept of reflective-impulsive response style to
Kagan. It is the objective operationalization of the concept via
the MFF and its scoring system that is attributable to Kagan (along with
a formidable body of construct research and conceptual development). Any
discussions of validity and reliability, or the construct, must necessarily
consider early research and development completed on the topic shortly after
the turn of the century, and intermittently down to the present time. This
lack of historical knowledge foments a potential flaw for much of the
contemporary thought, research, and criticism of the reflective-impulsive
construct.

Another basic cause of the current reflective-impulsive imbroglio would
appear to reside in the double median split scoring procedure discussed
earlier. Much intensive research on the implications of the double median
split scoring procedure would appear to be in order.


The MFF Test items themselves would appear to be latent sources of test
reliability and validity problems. The potential authenticity of this statement
increases when the item graphic displays presented earlier in this paper are
considered, and the potential implications for validity and reliability are
considered along the lines of reliability implications for reversals of score
classifications reported by Thorndike and Hagen (see p. 10). The item
analysis performed here would appear to have profound
implications for the present form of the MFF Test itself, as well as research
conducted using this instrument, and the future direction of MFF research.

Steps Necessary to Enhance MFF Test/Construct Versatility, Research, and Integrity

A variety of steps would appear necessary when one considers the scope and
nature of the problems besetting the MFF area. Initially, it appears, based
upon the research and development of Hartshorne, May, and Maller, 1929, Murray,
1938, Kagan, et al., 1964, that the construct itself is sound. Any claims
concerning or questioning the construct must assume the burden of disclamatory
proof, which appears no mean task in face of the research evidence.
Consequently, many of the steps that must be taken have been suggested or stated
in this paper, tacitly or explicitly. They are a logical consequence of
problems raised or issues broached. A reiteration will be advanced at this
point so that a frame or general perspective may be advanced. The components
of that frame include a need for:

1. A thorough and comprehensive analysis of the historical development
of the reflective-impulsive construct. This analysis must not start
in the late 1950's, or with Kagan, but should trace the development
of the construct as far as is possible.

2. A thorough and scientific item analysis of the MFF test items. This
analysis should include item difficulty, item discrimination, and
test validity.

3. Development of a new or revision of the old MFF Test using classical
measurement principles.




4. Development of a new scoring system. This system should provide the
MFF (revised or new test) with the potential for increased reliability,
validity, and utility (e.g., individual administration). Additionally,
this system should be economical from a time standpoint. Such a
system appears in Appendices D, E, and F of this paper; Appendix G
includes a matrix and comments listing selected norming considerations.

5. Development of norms for the revised, or new, MFF Test. These
norms should likewise support individual administration (#4 preceding),
and provide more substantive data about specific group and individual
characteristics. Specific reliability and validity figures should be
included. The norms should be developed classically, in an SD format,
with the SEM concept included. This may ultimately result in
68.26 percent of a group being classified as unclassified (and
subsequently possibly more amenable to experimental treatment)
while the remaining 31.74 percent would fall in progressively
more reflective or impulsive categories. This approach is almost
a direct reversal of the double median split scoring procedure.
Note the following comparisons:

Table 14

Student Percent Distributions
for Two Scoring Systems

Scoring System                Percent      Percent      Percent
                              Reflective   Impulsive    Unclassified
Double Median Split           35           35           30
ID Score Format               15.86        15.86        68.26
Scoring System Difference     19.14        19.14        -38.26






Additionally, Appendix G includes a matrix and comments listing selected norming
minimums.

6. Comparison/contrast of the double median split - ID Score procedures/
systems and their implications for research and practice. This
includes the practice of neglecting response latencies past the initial
error.


7. A review of more salient early studies in the perspective of
subsequently established normative tables.

8. An analysis of MFF scoring trends in order to determine if
increasing reflectivity or impulsivity is a characteristic of contemporary
trends.

9. Comparison/correlation of reflective-impulsive group
traits/characteristics in the area of cognitive style and [illegible].
The new scoring system would increase controls and the opportunity
for such studies.



10. Calculation of reliabilities and validities for the double median
split scoring procedures as well as the ID Score procedure. A
comparative analysis of the implications of both systems ought to
add substantive knowledge to the reflective-impulsive construct
research area.

11. Investigation of the double median split low error score reliability
problem (r = .24; Messer, 1970) in the perspective of ID
Score reliability. The ratio dimension of the ID Score may have
positive implications for score stability in the face of its
response latency relationship.

12. The growing movement towards normalization of a test that is fraught
with reliability and validity problems ought to be examined. For
example, Salkind (1977) has developed norms based on existing
research data obtained during investigations by other researchers.
This researcher had considered such a move, but initial item
analyses results obtained during preliminary test data analyses
were considered sufficient to preclude such action. Subpopulation
score differences, coupled with low reliabilities, further confound
this problem.



13. Development of an annotated bibliography for the reflective-impulsive
area. Such a bibliography should include validity, reliability,
reading, sex, and other differential factor study citations.
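The SD-format norms proposed in step 5 can be illustrated with the normal curve. The sketch below is an assumption-laden reading, not the paper's procedure: it assumes ID Scores are converted to z-scores, that the cut points sit at one standard deviation on either side of the mean, and that a higher ID Score (longer latency per error) is the reflective direction. All names are hypothetical.

```python
import math


def normal_cdf(z):
    """Cumulative probability of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sd_format_shares(cut=1.0):
    """Expected percent (impulsive, unclassified, reflective) when the
    unclassified band is within `cut` standard deviations of the mean."""
    middle = normal_cdf(cut) - normal_cdf(-cut)
    tail = (1.0 - middle) / 2.0
    return tail * 100, middle * 100, tail * 100


def classify_z(z, cut=1.0):
    """Assumed direction: high ID Score z = reflective, low = impulsive."""
    if z >= cut:
        return "Reflective"
    if z <= -cut:
        return "Impulsive"
    return "Unclassified"


impulsive, unclassified, reflective = sd_format_shares()
print(round(impulsive, 2), round(unclassified, 2), round(reflective, 2))
# 15.87 68.27 15.87
```

The computed shares agree with the 15.86 and 68.26 percent figures quoted in step 5 to within rounding of the last digit.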



Limitations



A considerable number of limitations exist for this paper. Some of the
more salient ones are:

1. The MFF Test is used interchangeably with the reflective-impulsive
construct.

2. The depth of analysis conducted here has been somewhat superficial.
Computation of item discrimination, reliability, and validity indices,
along with other statistics, would seem to add much valuable
information on which to base judgements.

3. The comprehensive focus of this paper is a limitation closely related
to #2.

4. The low n would appear to be a severe limitation.

5. The remedial reading sample constitutes a severe limitation.
However, if two-thirds of the MFF Test does not function
systematically for a population of this nature, it would appear that
generalizability to similar groups would be highly suspect. This
aspect combines with #'s 2 and 4 to preclude such an undertaking
with this data.

6. The selection process for Grades 6-8 constitutes a deficiency and
may account for some of the erratic data at these levels.



7. The zero point - open end of the ID Score comprises a problem that
needs investigation; distributions will be positively
skewed. The nature of the response latency - error rate relationship,
however, indicates that this is not a serious deficiency, if a
deficiency at all.

8. Inclusion of Grades 7 - 8 for a test that has a normal upper limit
of 12 years of age is enlightening from one perspective, but a
bona fide limitation from another.

Despite the limitations and inchoate nature of this research and its
findings, it would appear that research directed along the lines suggested
in this paper, and by other researchers, would be in the best interests of
the scientific method. This is especially true when the current state of 
the art is viewed in the perspective of a classical measurement approach 
to sound test development. 






Appendix A

Grade 6
n = 15

Random ordinal
scores (unrelated)          Related scores

Response    Error           Response    Error
Latency     Rate            Latency     Rate     Classification
46          1               46          23       Impulsive
46.5        3               73          11       Impulsive
52          4               104         5        Reflective
[illegible] 4               110         10       Unclassified
73          4               194.5       1        Reflective
92          5               112         3        Reflective
100         7               92          9        Impulsive
104         8  Median (P50) 163         4        Reflective
110         9               46.5        8        Impulsive
112         10              157         7        Reflective
116         10              100         14       Impulsive
126.5       11              116         4        Reflective
157         12              [illegible] 12       Impulsive
163         14              126.5       4        Reflective
194.5       23              52          10       Impulsive

P50 = 104/8 (Actual)

Impulsives    = 7 = 46.6%
Reflectives   = 7 = 46.6%
Unclassifieds = 1 = 6.6%
Total = 15 = 100%

Impulsive = at or below P50 on RL - at or above P50 on ER

Reflective = at or above P50 on RL - at or below P50 on ER

Unclassified = above or below P50 on RL and ER

Note: The Unclassified subject (110/10) would be called a slow inaccurate. The line
separating such subjects would appear to be exceedingly fine indeed, insofar
as this particular case is concerned.
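The double median split rules stated in Appendix A can be written out as a short Python sketch. The function name is an assumption, and so is the tie-breaking order: a subject sitting exactly on both medians satisfies both definitions as worded, and this sketch arbitrarily checks Impulsive first. The medians (104 seconds, 8 errors) and the example subjects are taken from Appendix A.

```python
def classify_double_median(rl, er, rl_median, er_median):
    """Classify one subject from response latency (rl) and error rate (er)."""
    if rl <= rl_median and er >= er_median:
        return "Impulsive"     # fast and inaccurate
    if rl >= rl_median and er <= er_median:
        return "Reflective"    # slow and accurate
    return "Unclassified"      # fast accurate or slow inaccurate


# Grade 6 medians reported in Appendix A: P50 = 104/8.
for rl, er in [(46, 23), (194.5, 1), (110, 10), (104, 5), (46.5, 8)]:
    print(rl, er, classify_double_median(rl, er, 104, 8))
```

The 110/10 subject comes out Unclassified (slow inaccurate) even though the scores sit only slightly past both medians, which is the "exceedingly fine" line the note describes.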






Appendix B

Grade 7
n = 21

Ordinal scores              Related scores

Response    Error           Response    Error
Latency     Rate            Latency     Rate     Classification
62          2               93          13       Impulsive
65.5        2               109.5       6        Reflective
66          4               65.5        17       Impulsive
83          6               98          10       Impulsive
84          7               154         8        Reflective
85.5        7               83          10       Impulsive
89.5        8               66          8        Impulsive
93          8               103         8        Reflective
95          8               142         8        Reflective
98          8               89.5        11       Impulsive
99          8  Median (P50) 102         2        Reflective
102         8               133.5       10       Unclassified
103         9               156.5       4        Reflective
109.5       10              243         7        Reflective
133.5       10              147         2        Reflective
134         10              99          10       Impulsive
142         10              95          8        Unclassified
147         11              134         7        Reflective
154         11              84          11       Impulsive
156.5       13              85.5        9        Impulsive
243         17              62          8        Impulsive

P50 = 99/8.25 (actual)

Impulsives    = 10 = 47.6%
Reflectives   = 9 = 42.8%
Unclassifieds = 2 = 9.5%

Total = 21 = 100%

Note: The Unclassified subjects (95/8, 133.5/10) demonstrate what appear to
be marginal differences to earn the label fast accurate and slow
inaccurate respectively.



Appendix C

Grade 8
n = 33

Ordinal scores              Related scores

Response    Error           Response    Error
Latency     Rate            Latency     Rate     Classification
50.5        3               221.5       4        Reflective
53          3               95          7        Impulsive
54          4               96          3        Reflective
57          4               83          9        Impulsive
62          5               113         7        Reflective
66.5        5               184.5       5        Reflective
66.5        5               62          10       Impulsive
67          5               113         10       Unclassified
72.5        5               76          6        Unclassified
76          5               77          12       Impulsive
77          6               57          10       Impulsive
81          6               54          14       Impulsive
82          6               81          7        Impulsive
83          7               72.5        6        Unclassified
95          7               66.5        7        Impulsive
95.5        7               101.5       9        Unclassified
96          7  Median       149.5       7        Reflective
100         7               189         5        Reflective
100         7               119.5       9        Unclassified
101.5       8               180.5       5        Reflective
113         9               82          6        Unclassified
113         9               66.5        5        Unclassified
119.5       9               100         9        Unclassified
130         9               67          13       Impulsive
134.5       9               158         3        Reflective
149.5       10              134.5       7        Reflective
155         10              95.5        4        Unclassified
158         10              130         5        Reflective
178.5       11              50.5        13       Impulsive
180.5       12              53          9        Impulsive
184.5       13              155         11       Unclassified
189         13              178.5       8        Unclassified
221.5       14              100         5        Reflective

P50 = 96/7.083 (actual)

Impulsives    = 13 = 39.39%
Reflectives   = 9 = 27.2%
Unclassifieds = 11 = 33.33%

Total = 33 = 100%

Note: Eyeball analysis of the Unclassified in Grade 8 reveals a pattern of
marginal fast accurates and slow inaccurates, while several subjects
demonstrate more pronounced differences (e.g., 155/11 would appear
to be a bona fide slow inaccurate, while 101.5/9 would not).



Appendix D

ID Score Computation

Directions: Use these steps with Appendix F in order to obtain the ID Score.

Step

1. ΣRL column 7.

2. ΣER column 8.

3. Divide ΣRL (column 7) by ΣER (column 8) to obtain the ID Score (column 11).
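The three steps amount to a single division. A minimal Python sketch follows; the function name is an assumption, and returning infinity for a subject with zero errors is also an assumption made here only to show the open-ended case (the paper itself notes that the zero point of the ID Score needs investigation). The totals used in the example are the ones shown on the sample score sheet in Appendix F.

```python
import math


def id_score(total_response_latency, total_errors):
    """ID Score = summed response latency / summed errors."""
    if total_errors == 0:
        # Assumed handling: no errors leaves the ratio open-ended.
        return math.inf
    return total_response_latency / total_errors


# Totals from the sample score sheet: 156.0 seconds, 22 errors.
print(round(id_score(156.0, 22), 2))  # 7.09
```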



Appendix E

ID Score Sheet Directions

Directions: To use this sheet with Appendix F complete all steps in order.
Step numbers equate to Appendix F notations.

Step

1. Enter subject's name of record.

2. Enter subject's age.

3. Enter subject's sex.

4. Enter subject's grade.

5. Enter subject's test date.

6. Enter subjective examiner observation, i.e., classification as reflective
or impulsive obtained during testing.

7. Enter response latency in seconds to first response.

8. Enter error rate as errors occur.

9. Enter error order as errors occur.

10. Enter relevant notes as incidents occur during the test situation.

11. Compute the ID Score by dividing the column 7 total (response latency)
by the column 8 total (error rate). See Appendix D.

12. Obtain the subject's birth date from the records and enter it.

13. Compute chronological age as of the test date.

14. Enter the number of years in school (this information should be
obtained from the record and may not agree with grade placement
due to retention).
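Step 13 can be carried out mechanically from the two dates recorded in steps 5 and 12. The following Python sketch shows one way to do so; the function name, the years-and-months output format, and the example dates are all hypothetical and do not come from the score sheet.

```python
from datetime import date


def chronological_age(birth_date, test_date):
    """Return age as (years, completed months) on the test date."""
    if test_date < birth_date:
        raise ValueError("test date precedes birth date")
    months = (test_date.year - birth_date.year) * 12
    months += test_date.month - birth_date.month
    if test_date.day < birth_date.day:
        months -= 1  # the current month is not yet complete
    return divmod(months, 12)


# Hypothetical subject born 15 March 1967 and tested 1 February 1979.
print(chronological_age(date(1967, 3, 15), date(1979, 2, 1)))  # (11, 10)
```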

Note: Information noted on the ID Score sheet is considered minimal. The
following specifics are added for explanatory purposes:

Step

6. Subjective observations (SO) may someday be compiled and correlated in order
to provide important information about individual scores and the norming
system. SO, however, requires considerable experience with test
administration to gain the proficiency that would seem necessary to function
effectively in this area.

8. Although the total number of errors per item is a consideration, only
the initial response latency is used in the ID Score calculation.
General examinee response latency behavior past the initial error
should accordingly be entered in the Notes column (#10). This total
errors - initial response latency situation exists for the double
median split scoring procedure also, and would seem to warrant
investigation.

9. Error order information may be used at a later date to provide
valuable item discrimination information in the classical measurement
vein.

10. Notes (column 10) should include explanatory and test relevant
behavioral observation data.

11. The ID Score is used to classify subjects as reflective or impulsive.
This score will have little formal value until the test has been
replaced or revised, and normed accordingly.

12, 13. Birth date and chronological age information should be obtained from
the subject's record. This information may be used to provide
chronological age control for later score analysis.

14. Years in school information is intended to cover retentions.
Analysis of students with additional years in school (retentions) may
provide valuable test performance/behavioral performance data for
this group.



Appendix F

ID Score Sheet

Name ______  Age ______  Sex ______  Grade ______  Date ______  SO ______

12. Birth Date ______  13. Chronological Age ______  14. Years in School ______

Item           Response    Error        Error            ID Score   Notes
               Latency     Rate         Order
Sample A                   0                                        Noise coming from cafeteria
Sample B                   0
1. House       13.5        3            3, 6, 5
2. Scissors    10          0
3. Phone       18.0        [illegible]  1
4. Bear        8.0         5            5, 3, 6, 4, 2
5. Tree        20.0        1            6
6. Leaf        [illegible] [illegible]
7. Cat         12.5        0            5, 1
8. Dress       18.0        4            2, 3, 1, 4
9. Giraffe     15.0        2            6, 2
10. Lamb       10.5        2            1, 6
11. Boat       15.5        0
12. Cowboy     11.0        [illegible]  3, 1                        Random directional
Total          156.0       22                            7.09       Unclassified x double median

Notes (items 1-3): Said pointed - didn't see



Appendix G

Suggested ID Norming
Matrix

Note: Acquisition of the following information should provide effective test
norms. The test to be normed, however, must be sound prior to
undertaking the norming process.

September/February

Grade        1     2     3     4     5     6     7     8
Sex          M F   M F   M F   M F   M F   M F   M F   M F
Subject n

1. X age (total)
2. X age (retentions)
3. X age (regulars)

4. X IQ (total)
5. X IQ (retentions)
6. X IQ (regulars)

7. Response Time X R (total)
8. Response Time X R (retentions)
9. Response Time X R (regulars)

10. Error Rate X R (total)
11. Error Rate X R (retentions)
12. Error Rate X R (regulars)

13. ID Score information for 1-6
preceding (X's, R's, SD's, SEM's)

X suburban subjects
X rural subjects
X inner-city subjects
X race
X cross-national groups
X retentions

Additional information would be useful. This information might include reading
test scores (vocabulary/comprehension/total) for the specific groups, obtained,
e.g., by a concurrent administration.

Norms for a September and February administration would greatly improve the
utility of the data as well as the data interpretability process. Reliabilities
would automatically follow from such a norming procedure. It appears
that a grade level n of approximately 400 subjects (200 boys/200 girls) would
be necessary to soundly undertake the norming process for a specific population
segment, such as rural subjects, with a September/February administration.



References

Ault, R.L., Mitchell, C., and Hartmann, D.P. Some methodological problems
in reflection-impulsivity research. Child Development. 1976, 47,
227-231.

Block, J., Block, J.H., and Harrington, D. Some misgivings about the Matching
Familiar Figures Test as a measure of reflection-impulsivity.
Developmental Psychology. Vol. 10, No. 5, 611-632.

Cattell, R.B. Measurement versus intuition in applied psychology. Character
and Personality. 1937, 6, 114-131.

Egeland, B., and Weinberg, R. The Matching Familiar Figures Test: A look at its
psychometric credibility. Child Development. 1976, 47, 483-491.

Hall, V.C., and Russell, W.J.C. Multitrait-multimethod analysis of conceptual
tempo. Journal of Educational Psychology. 1974. Vol. 66, No. 6, 932-939.

Hartshorne, H., May, M.A., and Maller, J.B. Studies in service and self-control.
New York: The Macmillan Company. 1929.

Kagan, J., Rosman, B.L., Day, D., Albert, J., and Phillips, W. Information
processing in the child: Significance of analytic and reflective
attitudes. Psychological Monographs. 1964, 78 (1, Whole No. 578).

Kagan, J. Impulsive and reflective children: Significance of conceptual
tempo. In J.D. Krumboltz (Ed.), Learning and the Educational Process.
Chicago: Rand McNally, 1965, 133-161.

Messer, S. Reflection-impulsivity: Stability and school failure. Journal
of Educational Psychology, 61, December 1970, 487-498.

Murray, H.A. Explorations in personality. Oxford University Press, 1938.

Polansky, N., Lippitt, R., and Redl, F. An investigation of behavioral contagion
in groups. Human Relations: Studies towards the integration of the social
sciences. Vol. III, 4, 319-348.

Salkind, N.J. The development of norms for the Matching Familiar Figures Test.
University of Kansas, paper. Lawrence, Kansas, 1977.

Salkind, N.J., and Wright, J.C. The development of reflection-impulsivity and
cognitive efficiency: An integrated model. Human Development. 1977,
20, 377-387.

Sutton-Smith, B., and Rosenberg, B.G. A scale to identify impulsive behavior
in children. The Journal of Genetic Psychology. 1959. Vol. 95, 211-216.

Thorndike, R., and Hagen, E. Measurement and evaluation in psychology and
education, 4th edition. John Wiley & Sons, 1977, 93-94.



